feat(coloctapp): dynamic backscraper #1011

Merged (3 commits) on Jul 8, 2024

@grossir commented Apr 19, 2024

Helps solve #979

Since older coloctapp opinions are stored inside PDFs, this script scrapes the new search interface at https://research.coloradojudicial.gov/, which has a vLex backend.
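As a rough illustration of what scraping a search interface produces: a linear opinion scraper collects one dict per opinion, and the traceback later in this thread confirms that `OpinionSiteLinear._get_case_dates` reads `case["date"]` from each one. The sketch below assumes the search backend returns JSON-like hits. The hit field names (`title`, `pdf_url`, `published`, `docket_number`) and the helpers `hit_to_case`/`build_cases` are hypothetical, not the real vLex or research.coloradojudicial.gov schema, and the `name`/`url`/`docket` keys are assumed conventions.

```python
# Hypothetical hit structure: these field names are assumptions,
# not the real search API schema.
SAMPLE_HITS = [
    {
        "title": "People v. Example",
        "pdf_url": "https://example.com/opinion-1.pdf",
        "published": "2021-10-14",
        "docket_number": "20CA0001",
    },
]


def hit_to_case(hit: dict) -> dict:
    """Map one search hit onto a case dict (hypothetical helper)."""
    return {
        "name": hit["title"],
        "url": hit["pdf_url"],
        # Keep the raw date string; the scraper parses it later.
        "date": hit["published"],
        "docket": hit["docket_number"],
    }


def build_cases(hits: list[dict]) -> list[dict]:
    """Convert every hit; each resulting dict carries a "date" key."""
    return [hit_to_case(hit) for hit in hits]


cases = build_cases(SAMPLE_HITS)
print(cases[0]["date"])  # prints 2021-10-14
```

Keeping `"date"` as a raw string leaves conversion to the scraper's own date parsing (`convert_date_string` in the traceback); the one hard requirement visible in this thread is that every dict carries that key.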

@quevon24 self-requested a review June 1, 2024 01:13
@quevon24 left a comment

I think the command to fill the gaps is incorrect:

docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.state.coloctapp --backscrape-start=09/28/2021 --backscrape-end=02/01/2022

The correct command should be:

docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states_backscrapers.state.coloctapp --backscrape --backscrape-start=09/28/2021 --backscrape-end=02/01/2022
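Both commands pass the gap as MM/DD/YYYY strings through `--backscrape-start` and `--backscrape-end`. One way a dynamic backscraper can use such a range is to split it into consecutive date windows and query the backend once per window. This is a minimal sketch under that assumption; `parse_cli_date`, `make_date_windows`, and the 30-day window size are hypothetical and not taken from juriscraper or this PR.

```python
from datetime import date, datetime, timedelta


# Hypothetical helpers, not juriscraper's implementation.
def parse_cli_date(value: str) -> date:
    """Parse an MM/DD/YYYY string such as "09/28/2021"."""
    return datetime.strptime(value, "%m/%d/%Y").date()


def make_date_windows(start: date, end: date, days: int = 30) -> list[tuple[date, date]]:
    """Split [start, end] into inclusive, non-overlapping windows of at most `days` days."""
    if start > end:
        raise ValueError("start must not be after end")
    windows = []
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=days - 1), end)
        windows.append((current, window_end))
        # The next window starts the day after this one ends.
        current = window_end + timedelta(days=1)
    return windows


windows = make_date_windows(parse_cli_date("09/28/2021"), parse_cli_date("02/01/2022"))
for window_start, window_end in windows:
    print(window_start, window_end)
```

Inclusive, non-overlapping windows mean no day in the gap is skipped or queried twice; a smaller window trades more requests for smaller result pages.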

I tried running it several times, but it always gives me the same error:

Traceback (most recent call last):
  File "/home/quevon24/PycharmProjects/juriscraper/sample_caller.py", line 253, in main
    site.parse()
  File "/home/quevon24/PycharmProjects/juriscraper/juriscraper/AbstractSite.py", line 145, in parse
    self.__setattr__(attr, getattr(self, f"_get_{attr}")())
  File "/home/quevon24/PycharmProjects/juriscraper/juriscraper/OpinionSiteLinear.py", line 29, in _get_case_dates
    return [convert_date_string(case["date"]) for case in self.cases]
  File "/home/quevon24/PycharmProjects/juriscraper/juriscraper/OpinionSiteLinear.py", line 29, in <listcomp>
    return [convert_date_string(case["date"]) for case in self.cases]
KeyError: 'date'

Let me know if you want me to try anything in particular 👍
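The traceback shows `_get_case_dates` evaluating `case["date"]` for every entry in `self.cases`, so a single collected case without a `"date"` key aborts the whole `parse()`. While debugging, a quick check like the one below can pinpoint which records are incomplete. It is a hypothetical helper, not part of juriscraper; the required-key set is an assumption, and `datetime.strptime` stands in for juriscraper's `convert_date_string`.

```python
from datetime import datetime

# Assumed minimal key set; "date" is the key that failed in the traceback.
REQUIRED_KEYS = ("name", "url", "date")


def find_incomplete_cases(cases: list[dict]) -> list[tuple[int, list[str]]]:
    """Hypothetical helper: return (index, missing_keys) for each incomplete case."""
    problems = []
    for i, case in enumerate(cases):
        missing = [key for key in REQUIRED_KEYS if key not in case]
        if missing:
            problems.append((i, missing))
    return problems


def get_case_dates(cases: list[dict]) -> list:
    # Mirrors the failing list comprehension, with strptime standing in
    # for convert_date_string.
    return [datetime.strptime(case["date"], "%Y-%m-%d").date() for case in cases]


cases = [
    {"name": "A v. B", "url": "https://example.com/a.pdf", "date": "2021-10-14"},
    {"name": "C v. D", "url": "https://example.com/c.pdf"},  # no "date" key
]

print(find_incomplete_cases(cases))  # [(1, ['date'])]
try:
    get_case_dates(cases)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'date'
```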

@grossir force-pushed the coloctapp_dynamic_backscraper branch from 2aa6737 to 3c63257 July 5, 2024 02:06
Colorado Courts have changed their site; the old site is no longer available, so the existing scrapers won't work.

Helps solve freelawproject#1062 and freelawproject#979
@grossir force-pushed the coloctapp_dynamic_backscraper branch from 3c63257 to ea8924c July 5, 2024 16:44
@grossir (author) commented Jul 5, 2024

The old Colorado Courts sources no longer work, so this PR solves both the gap issue and the need for a new scraper. @quevon24, please check this again.

@grossir requested a review from quevon24 July 5, 2024 16:47
@quevon24 left a comment

All good 👍

@quevon24 merged commit ded1007 into freelawproject:main Jul 8, 2024
8 checks passed